A Multiplatform Chemometric Approach to Modeling of Mosquito Repellents
variables to a smaller set of uncorrelated components. Then least squares regression is
performed on these new components, instead of the original data. ULR, as the simplest
method, is usually the first step in the regression modeling.
Linear modeling can be illustrated by the QSAR modeling of the repellence index (Rindex)
for a set comprising several natural compounds (carvacrol, thymol,
cuminic acid, n-butyl cinnamate, ethyl cinnamate, benzyl benzoate, lauric acid) and some
newly synthesized compounds (Syn1 – KO5, Syn2 – KO10, Syn3 – KO2, Syn4 – KO3,
Syn5 – KO9, Syn6 – KO6, Syn7 – KO4, Syn8 – KO7, Syn9 – KO11, Syn10 – KO13,
Syn11 – KO12, Syn12 – KO8, Syn13 – KO16) (Thireou et al. 2018). The repellence indices
of these compounds toward A. gambiae females were published by Thireou et al. (2018).
The simplest model is the ULR model, which correlates Rindex with the boiling point
(BP) of the compounds:
\[
\text{ULR}:\quad R_{\mathrm{index}} = 252.8713\,(\pm 38.36695) - 0.3041248\,(\pm 0.05775952)\,\mathrm{BP}
\tag{9.1}
\]
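As a small worked illustration, Eq. (9.1) can be evaluated directly. The sketch below only reuses the printed coefficients; the boiling point values are hypothetical inputs chosen for illustration (they are not taken from the Thireou et al. data set), and BP is assumed to be in the same units as the descriptor used to fit the model.

```python
# Coefficients of Eq. (9.1) as printed in the text.
ULR_INTERCEPT = 252.8713
ULR_SLOPE_BP = -0.3041248


def predict_rindex_ulr(bp):
    """Return the Rindex predicted by the ULR model for boiling point `bp`."""
    return ULR_INTERCEPT + ULR_SLOPE_BP * bp


# Hypothetical boiling points (illustrative only, not from the source data).
for bp in (450.0, 500.0, 550.0):
    print(f"BP = {bp:6.1f} -> predicted Rindex = {predict_rindex_ulr(bp):7.2f}")
```

Because the slope is negative, the predicted Rindex decreases as BP increases.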
The MLR models correlate Rindex of the same group of compounds with more than
one molecular descriptor. The MLR1 model predicts Rindex from the critical pressure
(CP) and the calculated molar refractivity (CMR):
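\[
\begin{aligned}
\text{MLR1}:\quad R_{\mathrm{index}} ={}& 404.7651\,(\pm 56.47803) - 4.499053\,(\pm 1.119779)\,\mathrm{CP} \\
& - 36.69531\,(\pm 5.468419)\,\mathrm{CMR}
\end{aligned}
\tag{9.2}
\]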
This model can be presented as a 3D surface plot, as shown in Figure 9.3, which makes it
easy to see what values of CP and CMR a compound should have to achieve the desired
Rindex.
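The values behind such a surface can be tabulated from Eq. (9.2). The following is a minimal sketch, not the procedure used to produce Figure 9.3: the CP and CMR ranges and the grid size are hypothetical and serve only to show how the response surface is built.

```python
def predict_rindex_mlr1(cp, cmr):
    """Return the Rindex predicted by the MLR1 model, Eq. (9.2)."""
    return 404.7651 - 4.499053 * cp - 36.69531 * cmr


def linspace(start, stop, num):
    """Return `num` evenly spaced values from `start` to `stop` inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


# Hypothetical descriptor ranges (assumptions, not taken from the source).
cp_values = linspace(15.0, 35.0, 5)
cmr_values = linspace(3.0, 7.0, 5)

# surface[i][j] is the prediction for cmr_values[i] and cp_values[j].
surface = [[predict_rindex_mlr1(cp, cmr) for cp in cp_values]
           for cmr in cmr_values]

for cmr, row in zip(cmr_values, surface):
    print(f"CMR = {cmr:4.1f}: " + "  ".join(f"{z:7.1f}" for z in row))
```

Since both coefficients are negative, the highest predicted Rindex on any such grid lies at the corner with the lowest CP and the lowest CMR. The nested list could be passed to a plotting library's surface routine to draw the plot.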
The MLR2 model presents the relationship between Rindex and three independent vari-
ables: BP, total polar surface area (tPSA) and calculated lipophilicity descriptor (ClogP):
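\[
\begin{aligned}
\text{MLR2}:\quad R_{\mathrm{index}} ={}& 248.8165\,(\pm 37.2906) - 0.480915\,(\pm 0.06351143)\,\mathrm{BP} \\
& + 1.853831\,(\pm 0.4399472)\,\mathrm{tPSA} + 16.48189\,(\pm 5.434551)\,\mathrm{ClogP}
\end{aligned}
\tag{9.3}
\]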
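To see how the individual terms of Eq. (9.3) add up for one compound, the prediction can be split into per-descriptor contributions. This is only an illustrative sketch: the descriptor values passed in at the bottom are hypothetical and do not describe any compound from the studied set.

```python
# Coefficients of Eq. (9.3) as printed in the text.
MLR2_INTERCEPT = 248.8165
MLR2_COEFFS = {"BP": -0.480915, "tPSA": 1.853831, "ClogP": 16.48189}


def mlr2_contributions(descriptors):
    """Return each term's contribution (coefficient * descriptor value)."""
    return {name: coef * descriptors[name] for name, coef in MLR2_COEFFS.items()}


def predict_rindex_mlr2(descriptors):
    """Return the Rindex predicted by the MLR2 model, Eq. (9.3)."""
    return MLR2_INTERCEPT + sum(mlr2_contributions(descriptors).values())


# Hypothetical descriptor values (assumptions for illustration only).
example = {"BP": 500.0, "tPSA": 40.0, "ClogP": 3.0}

for name, value in mlr2_contributions(example).items():
    print(f"{name:6s} contributes {value:9.3f}")
print(f"Predicted Rindex = {predict_rindex_mlr2(example):.3f}")
```

Each contribution is the product of a coefficient and a descriptor value, so it depends on the numerical scale of the descriptor as well as on the coefficient.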
This model points to three molecular features that affect the repellence
of the studied series of compounds. Judged by the magnitude of the regression
coefficients, the lipophilicity parameter (ClogP), which has the largest coefficient
in this model, has the greatest influence on Rindex. The most suitable descriptors
for the MLR models were selected with the all possible regressions routine of the
NCSS 2007 program from a set of descriptors comprising boiling point (BP), melting
point (MP), critical temperature (CT), critical pressure (CP), critical volume (CV),
Gibbs energy (GE), lipophilicity (logP), molar refractivity (MR), total polar surface
area (tPSA), calculated lipophilicity descriptor (ClogP) and calculated molar
refractivity (CMR). All descriptors were calculated with the ChemBioDraw
Ultra 13.0 program (PerkinElmer Inc.).
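The idea behind an all possible regressions search can be sketched in a few lines: fit an ordinary least squares model for every subset of descriptors up to a chosen size and rank the fits. The sketch below is not the NCSS implementation. Ranking by adjusted R² is an assumption (the text does not state the selection criterion), the helper names are hypothetical, and the descriptor values are made-up toy numbers, with the response constructed to depend only on BP and ClogP.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least squares via the normal equations; returns (coefficients, R^2)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    p = len(X[0])
    xtx = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)] for j in range(p)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(p)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def all_possible_regressions(data, y, max_size):
    """Fit every descriptor subset up to max_size; sort by adjusted R^2."""
    n = len(y)
    results = []
    for k in range(1, max_size + 1):
        for subset in combinations(sorted(data), k):
            rows = list(zip(*(data[name] for name in subset)))
            beta, r2 = fit_ols(rows, y)
            adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
            results.append((adj_r2, subset, beta))
    return sorted(results, key=lambda t: t[0], reverse=True)


# Toy descriptor table: made-up numbers, NOT data from the source.
descriptors = {
    "BP": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "ClogP": [3.0, 1.0, 4.0, 1.5, 5.0, 9.0, 2.0, 6.0],
    "MP": [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0],
    "tPSA": [2.0, 7.0, 1.0, 8.0, 2.5, 8.5, 3.0, 9.0],
}
# Toy response built from BP and ClogP only, so the search should recover them.
rindex_toy = [10.0 + 2.0 * bp - 3.0 * cl
              for bp, cl in zip(descriptors["BP"], descriptors["ClogP"])]

ranked = all_possible_regressions(descriptors, rindex_toy, max_size=2)
for adj_r2, subset, _ in ranked[:3]:
    print(f"{', '.join(subset):12s} adjusted R^2 = {adj_r2:.4f}")
```

With the 11 descriptors listed above, an exhaustive search over all subset sizes would cover 2^11 − 1 = 2047 candidate models, which is why the subset size is usually capped for small data sets.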
Although mathematically the simplest and easiest to interpret, ULR models often can-
not fully describe the dependence of the biological response on the molecular structure,